ABSTRACT

We present a modified adaptive matched filter algorithm designed to identify
clusters of galaxies in wide-field imaging surveys such as the Sloan Digital
Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys
with spectroscopic coverage, multicolor photometric redshifts, no redshift
information at all, and any combination of these within one survey. It works with
high efficiency in multi-band imaging surveys where photometric redshifts can
be estimated with well-understood error distributions. Tests of the algorithm on
realistic mock SDSS catalogs suggest that the detected sample is ~85% complete
and over 90% pure for clusters with masses above $1.0 \times 10^{14}\,h^{-1} M_\odot$ and
redshifts up to z = 0.45. The errors of the cluster redshifts estimated by the
maximum likelihood method are shown to be small (typically less than 0.01) over the
whole redshift range, with photometric redshift errors typical of those found in the
Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity
of $\Delta = 200$, we find the derived cluster richness $\Lambda_{200}$ to be a roughly linear
indicator of its virial mass $M_{200}$, which recovers well the relation between total
luminosity and cluster mass of the input simulation.

Subject headings: cosmology: theory - galaxies: clusters: general - large-scale
structure of universe
1. Introduction



Clusters of galaxies are the most massive virialized systems in the Universe and have
been extensively used to study galaxy population and evolution (Dressler 1984; Dressler &
Gunn 1992), to trace the large-scale structure of the universe (Bahcall 1988; Postman et al.
1992), and to constrain cosmology (Evrard 1989; Bahcall et al. 1999; Henry 2000; Pierpaoli
et al. 2001, 2003). Given the important roles clusters of galaxies play in the studies of both
astrophysics and cosmology, tremendous efforts have been made during the past several decades
to search for these systems. The first large samples of clusters were identified by looking for
projected galaxy overdensities through visual inspection of photographic plates (Abell 1958;
Abell et al. 1989; Zwicky et al. 1968). These catalogs made pioneering contributions to our
understanding of the extragalactic universe and since their generation have opened many new
frontiers in the studies of galaxy clusters. However, the compilation of a relatively complete
and pure sample of galaxy clusters has remained far from trivial. To date the Abell catalog,
which contains about 4000 rich clusters to a redshift of z ~ 0.2, is still the most widely used
cluster catalog in the field, though it was realized early that visually-constructed catalogs
suffer from projection effects, subjectivity, and large uncertainties in estimated properties
(Sutherland 1988). It is difficult to apply these catalogs for statistical studies in cosmology
because of these uncertainties, in addition to the fact that the selection function and false
positive rates of such cluster samples are hard to quantify.

To relieve some of these concerns, other approaches for identifying clusters have also
been designed and implemented, such as reconstructing the full 3-D structures in complete
redshift surveys (Huchra & Geller 1982; Geller & Huchra 1983; Ramella et al. 1997),
detecting clusters in X-ray surveys (Gioia et al. 1990; Rosati et al. 1998; Mohr et al.;
Romer et al. 2000; Scharf et al.; Edge et al. 1990; Ebeling et al.; Böhringer et al.;
Mullis et al. 2003; Böhringer et al. 2004), and utilizing the Sunyaev-Zeldovich effect
(Carlstrom et al. 2002; Pierpaoli et al. 2005) and weak gravitational lensing (Schneider
1996; Wittman et al. 2001) in the search for clusters. Moreover, the realization of large and
deep galaxy surveys in recent years has revived optical cluster-finding endeavors and prompted
the development of more automated and rigorous algorithms to select clusters from imaging
surveys. Using multi-color photometric data from which photometric redshifts can be estimated,
it is now possible to mitigate the problems of projection effects, and quantitative analysis of
the selection bias is also now possible. Automated peak-finding techniques in optical cluster
searches were attempted by Shectman (1985) and later used in the Edinburgh/Durham survey
(ED; Lumsden et al. 1992) as well as the Automatic Plate Measurement Facility survey
(APM; Dalton et al. 1994, 1997). In the construction of the cluster catalog from the Palomar
Distant Cluster Survey (Postman et al. 1996), a matched filter algorithm was developed to
select clusters from a photometric galaxy sample. It was widely used in subsequent surveys
and several variants have been put forward (Kawasaki et al. 1998; Schuecker & Boehringer
1998; Kepner et al. 1999; Kim et al. 2002; White & Kochanek 2002). Meanwhile, with the
knowledge of the existence of the "E/S0 ridgeline" of cluster galaxies in color-magnitude
space and the aid of multi-color CCD photometry, several color-based cluster-finding
techniques were also investigated (Gladders & Yee 2000; Goto et al. 2002; Gladders & Yee 2005;
Miller et al. 2005). Some of these have already been successfully applied to select clusters
from the Sloan Digital Sky Survey (SDSS) data (Goto et al. 2002; Annis et al. 2002;
Bahcall et al. 2003; Miller et al. 2005; Koester et al. 2007).



The Sloan Digital Sky Survey (York et al. 2000) is a five-band CCD imaging survey
of about $10^4$ deg$^2$ in the high-latitude North Galactic Cap and a smaller, deeper region in
the South, followed by an extensive multi-fiber spectroscopic survey. The imaging survey
is carried out in drift-scan mode in five SDSS filters (u, g, r, i, z) to a limiting magnitude
of r ~ 22.5 (Fukugita et al. 1996; Gunn et al. 1998; Lupton et al. 2001; Smith et al. 2002).
The spectroscopic survey targets ~$10^6$ galaxies to r ~ 17.7, with a median redshift of z ~ 0.1
(Strauss et al. 2002), and a smaller, deeper sample of ~$10^5$ Luminous Red Galaxies out to
z ~ 0.5 (Eisenstein et al. 2001). In this paper we discuss a modified adaptive matched filter
technique incorporating several new features over previous algorithms and designed to detect
clusters using both the SDSS imaging and spectroscopic data; it could readily be adapted
to other similar multi-band, large-area galaxy surveys for construction of optically-selected
cluster samples. This paper is the first of a series that will explore the application of the
technique to select clusters from the Sloan Digital Sky Survey.

The general idea of the matched filter method relies on the fact that clusters show on
average a typical density profile, now widely assumed to be the "NFW" form suggested first
by Navarro, Frenk and White (Navarro et al. 1996). Assuming that galaxies trace the dark
matter, we expect galaxies within clusters to be distributed according to such a profile. The
algorithm selects regions in the sky where the distribution of galaxies corresponds to the
projection of the average cluster density profile. In addition, it is possible to specify the
galaxy redshift information inside clusters, and to use prior knowledge on the galaxy luminosity
function. The combination of these matched subfilters thus enables us to extract a quantitative
signal corresponding to the existence of a cluster at a given location in the surveyed
sky area.

The modified matched filter technique presented in this paper can fully adapt to imaging
surveys with spectroscopic measurements, multicolor photometric redshifts, no redshift
information at all, and any combination of these within one survey. In the Sloan Digital
Sky Survey, where photometric redshifts can be estimated with well-understood error
distributions from the five-band (u, g, r, i, z) multi-color photometry, the matched filter technique
described here utilizes not only the spectroscopic coverage for the bright main sample galaxies
and Luminous Red Galaxies (LRGs) but also the photometric redshift information for
most of the galaxies detected in the imaging survey. This greatly expands the input galaxy
sample to feed into the cluster-finding algorithm compared to pure spectroscopic methods
(e.g., Miller et al. 2005). The obtained composite cluster catalog can also go much deeper in
redshift (z ~ 0.4-0.5 in this case) than the typical z ~ 0.2 limit for spectroscopic samples,
which is set by the lack of spectroscopic measurements for faint, distant galaxies.

Since the matched filter technique does not explicitly use the information about the red
sequence to select clusters as is done in some color-based cluster-finding methods (Annis et al.
2002; Miller et al. 2005; Koester et al. 2007), it can theoretically detect clusters of any type
in color, and is not restricted only to old, red E/S0 galaxies. Such clusters likely dominate
the cluster population, but may not constitute all of it, especially as one probes systems of
lower richness and at higher redshifts. The use of both spectroscopic and photometric redshift
information largely eliminates the projection effects and removes most of the phantom
clusters. The matched filter also generates accurate quantitative estimates of derived cluster
properties, such as redshift, scale, richness, and concentration, and produces quantitative
detection likelihoods, indicative of the combined information for both red and blue galaxies
identified as cluster members. These facilitate further studies of detected systems and
make the comparison to clusters selected by other methods easier. One major concern for
the matched filter technique is the fact that determination of these parameters depends on
the specific cluster model we put in to build the relevant filters. However, these effects can
be minimized by careful assumptions about the shape and evolution of the luminosity function,
and by the fact that our density filter is self-adaptive to different cluster scales and
concentrations. The clusters selected by the algorithm will provide us with the necessary sample
on which we then apply an iterative procedure aimed at refining the constraints on clusters'
properties. More details will be discussed in §2 and subsequent work following this
paper.

The new algorithm presented here differs from previous matched filter implementations
(Kepner et al. 1999; Kim et al. 2002) in several ways. We use a uniform Poisson likelihood
analysis, which is only the second step in the approach by Kepner et al. (1999) following
a first pass using Gaussian statistics for pre-selection of clusters. This avoids the common
problem for high-redshift clusters of having too few galaxies in any cell of interest for Gaussian
statistics to apply, and the adopted approach yields correct likelihoods even at the detection
stage. In addition, both the core radius and virial radius of the matched filter are adaptive
over the typical observed dynamical range for clusters, in contrast to most previous
cluster-finding techniques that set the cluster core radius or search radius to be fixed. For each
individual cluster, a best-fit core radius is found to maximize the likelihood match, as well
as an outer radius inside which the galaxy overdensity reaches $\Delta = 200$. The cluster richness
is then normalized to be the light contained within this virial radius, which we find correlates
better with the mass of gravitational systems whose extent is defined by density contrast as
is widely adopted in theoretical studies. The new features of our modified algorithm will be
further discussed in §2.

In order to understand the biases and the selection functions of our algorithm, we test
it on a mock SDSS catalog which has been constructed from the Hubble Volume Simulation
(Evrard et al. 2002) by assigning luminosities and colors to the dark matter particles
in a manner which reproduces many characteristics of the galaxy population from SDSS
observations. The "observations" of the simulations have then been further modified so that
the redshift scatter of those galaxies which have photometric but no spectroscopic redshifts
corresponds to that of the photometric redshift errors in actual SDSS data. The comparison
of the detected cluster sample with halos in the simulation provides the only rigorous way
to assess how the observed cluster properties relate to the real masses, and how the cluster
sample can be used to derive cosmological constraints.

In §2 we describe the modified adaptive matched filter technique and how it is
used to extract the cluster sample. §3 presents the basic features of the simulated
catalog we adopted for testing purposes. In §4 we show results on the completeness
and purity of our cluster sample, and the expected scaling relations inferred from runs on
the simulations. We conclude in §5.

A flat $\Lambda$CDM model with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ is used throughout this work, and we
assume a Hubble constant of $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$ if not specified otherwise.



2. The Cluster-Finding Algorithm 

The matched filter technique introduced here is a likelihood method which identifies
clusters by convolving the optical galaxy survey with a set of filters based on a modeling of
the cluster and field galaxy distributions. A cluster radial surface density profile, a galaxy
luminosity function, and redshift information (when available) are used to construct filters
in position, magnitude, and redshift space, from which a cluster likelihood map is generated.
The peaks in the map thus correspond to candidate cluster centers where the matches between
the survey data and the cluster filters are optimized. The algorithm automatically
provides the probability for the detection, best-fit estimates of cluster properties including
redshift, radius and richness, as well as membership assessment for each galaxy. The modified
algorithm can be fully adaptive to current and future galaxy surveys in 2-D (imaging), 2½-D
(where multi-color photometric redshifts and their errors can be estimated), and 3-D (with
full spectroscopic redshift measurements). Using the apparent magnitudes and, where
applicable, the redshift estimates, instead of simply searching for projected galaxy
overdensities, effectively suppresses the foreground-background contamination, and the technique
has proven to be an efficient way of selecting clusters of galaxies from large multi-band optical
surveys.

In what follows, we first provide a general introduction on how the likelihood function 
is constructed and how we detect clusters with the matched filter method. This gives us an 
overview about how the cluster catalog is derived. Then we discuss in more detail the density 
models and subfilters used to construct the likelihood. More specifically, we assume an NFW 
density profile, a general Schechter luminosity function and a Gaussian model for BCGs to 
model clusters, and use the spectroscopic measurements and obtained error distributions 
of galaxy photometric redshifts from the Sloan Digital Sky Survey to incorporate redshift 
uncertainties. In the end we describe how to determine the set of best-fit parameters on 
cluster properties that maximize the likelihood at a given position over a range of redshift, 
scale, concentration, and richness. 



2.1. Likelihood Function 

The likelihood function used here is based on the assumption that the probability of
finding galaxies in an infinitesimal bin in angular position, apparent magnitude and redshift
space is given by a Poisson distribution. Under this assumption, the total likelihood of many
such bins, which we take to be centered on the locations of the galaxies in the survey, is
(see appendix C2 in Kepner et al. (1999) for a full derivation):

$$\ln \mathcal{L} = -N_f - \sum_{k=1}^{N_c} N_k + \sum_{i=1}^{N_g} \ln P(i), \tag{1}$$

where $N_f$ is the total number of field galaxies expected within the searching area, $N_g$
is the total number of galaxies, and $\sum_{k=1}^{N_c} N_k$ is normalized to be the number of galaxies
brighter than $L^*$ as members of the $N_c$ clusters assumed in the model. $P(i)$ represents the
predicted probability density of galaxies in a given bin, which includes both probabilities of
field galaxies ($P_f$) and of cluster members ($P_c$),

$$P(i) = P_f(i) + \sum_{k=1}^{N_c} P_c(i,k). \tag{2}$$

These probabilities are the expected number densities for a given location and magnitude.
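
Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names and the toy numbers below are hypothetical and are not part of the algorithm described in this paper; the sketch only shows how the three terms of the Poisson log-likelihood combine once the densities $P(i)$ have been evaluated at the galaxy positions.

```python
import math


def total_density(p_field, p_clusters):
    """Equation (2): P(i) = P_f(i) + sum_k P_c(i, k) for one galaxy."""
    return p_field + sum(p_clusters)


def log_likelihood(n_field, cluster_counts, densities):
    """Equation (1): ln L = -N_f - sum_k N_k + sum_i ln P(i).

    n_field        -- expected number of field galaxies, N_f
    cluster_counts -- list of N_k, one entry per model cluster
    densities      -- list of P(i), one entry per observed galaxy
    """
    return -n_field - sum(cluster_counts) + sum(math.log(p) for p in densities)


# Toy numbers (illustrative only): three galaxies and one model cluster.
densities = [
    total_density(2.0, [0.5]),
    total_density(2.0, [3.0]),
    total_density(2.0, [0.0]),
]
ln_l = log_likelihood(n_field=6.0, cluster_counts=[1.5], densities=densities)
print(ln_l)
```

Working with the logarithm keeps the product over many small bin probabilities numerically stable, which matters when the sum runs over a full survey catalog.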

The cluster catalog is constructed with an iterative procedure similar to the one used in
Kochanek et al. (2003). We start our process from a density model of a smooth background
with no clusters. For each galaxy position, we then evaluate the likelihood increment we
would obtain by assuming that there is in fact a cluster centered on that galaxy. The
likelihood is then optimized by varying the cluster galaxy number $N_k$, the redshift and
cluster scale length. At each iteration, we retain the cluster candidate which resulted in
the greatest likelihood increase. We incorporate it in our density model and restart the
procedure. The function for finding the $k$-th cluster in the whole surveyed area therefore is

$$\Delta \ln \mathcal{L}(k) = -N_k + \sum_{i=1}^{N_g} \ln\left[\frac{P(i) + P_c(i,k)}{P(i)}\right], \tag{3}$$

where $P(i)$ here denotes the density model accumulated before the $k$-th cluster is added.

A list of cluster candidates then becomes available in decreasing order of detection
likelihoods. For each candidate one has derived properties, including best-fit position, scale,
richness, and estimated redshift. The initial cluster catalog allows us to further inspect each
individual candidate for exploration of substructure and better constraints on previously
fitted quantities.
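
The iterative selection can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the candidate clusters are given as fixed, precomputed densities (in the real procedure $N_k$, the redshift and the scale length are optimized for every candidate), and the stopping threshold is an assumption of this sketch.

```python
import math


def delta_log_likelihood(n_k, p_current, p_cluster):
    """Likelihood increment: -N_k + sum_i ln[(P(i) + P_c(i,k)) / P(i)]."""
    return -n_k + sum(math.log((p + pc) / p) for p, pc in zip(p_current, p_cluster))


def greedy_search(p_field, candidates, threshold=0.0):
    """Accept candidates one at a time, always taking the largest increment.

    candidates -- dict: name -> (N_k, [P_c(i, k) for each galaxy i])
    threshold  -- hypothetical stopping criterion on the increment
    """
    p_current = list(p_field)          # start from a smooth background
    remaining = dict(candidates)
    accepted = []
    while remaining:
        gains = {name: delta_log_likelihood(n, p_current, pc)
                 for name, (n, pc) in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] <= threshold:
            break
        accepted.append((best, gains[best]))
        _, pc = remaining.pop(best)
        # Fold the accepted cluster into the density model and restart.
        p_current = [p + c for p, c in zip(p_current, pc)]
    return accepted


# Toy example: candidate C overlaps candidate A on the first galaxy.
p_field = [1.0, 1.0, 1.0, 1.0]
candidates = {
    "A": (2.0, [9.0, 9.0, 0.0, 0.0]),
    "B": (1.0, [0.0, 0.0, 1.0, 0.0]),
    "C": (1.0, [9.0, 0.0, 0.0, 0.0]),
}
accepted = greedy_search(p_field, candidates)
print(accepted)
```

In the toy example, candidate C has a positive increment against the bare background but a negative one after A has been absorbed into the model, which is how updating the density model suppresses duplicate detections of the same overdensity.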



2.2. Density Model 

As both field and cluster galaxies are found in the survey, the probability of finding a
galaxy in a given bin depends on the density of both these populations (see eq. (2)).

For galaxy $i$ with angular position $\theta_i$, r-band apparent magnitude $m_i^r$ and redshift $z_i$
(when available), the background number density $P_f(i)$ can be directly extracted from the
global number counts of the galaxy survey,

$$P_f(i) = \frac{dN}{dm\,dz}(m_i^r, z_i), \tag{4}$$

and it has to be modified to account for the effects of galaxy redshift uncertainties if
photometric redshift estimates are used.

For cluster $k$ located at $\theta_k$ with proper scale length $r_{ck}$, redshift $z_k$ and galaxy number
$N_k$, the probability of galaxy $i$ being a member of it, $P_c(i,k)$, is just the product of a surface
density profile $\Sigma_c$ and a luminosity function $\phi_c$ at the cluster's redshift, times a distribution
function $f(z_i - z_k)$ that expresses redshift uncertainties:

$$P_c(i,k) = N_k\, \Sigma_c[D_A(z_k)\,\theta_{ik}]\; \phi_c[m_i^r - \mathcal{D}(z_k)]\; f(z_i - z_k), \tag{5}$$

where $\mathcal{D}(z_k)$ is defined through

$$M_i^r = m_i^r - 5\log[D_L(z_k)/10\,\mathrm{pc}] - k(z_k) = m_i^r - \mathcal{D}(z_k), \tag{6}$$

and where $D_A(z_k)$ and $D_L(z_k)$ are the angular diameter and luminosity distances at the
cluster's redshift $z_k$, and $k(z_k)$ is the k-correction. The conversion of units in luminosity and
distance is conducted by performing proper k-corrections for galaxies of different spectral
types and choosing the proper cosmology (see
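
As a worked illustration of equation (6), the following Python sketch evaluates the distance term $\mathcal{D}(z)$ for the flat $\Lambda$CDM cosmology adopted in this paper. It is a sketch under stated assumptions: the function names are hypothetical, the k-correction is passed in as a plain number rather than computed from a spectral template, and the comoving distance is obtained with a simple Simpson rule.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_mpc(z, omega_m=0.3, h=1.0, steps=1000):
    """Line-of-sight comoving distance in Mpc for flat LambdaCDM.

    Uses the composite Simpson rule; `steps` must be even.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * inv_e(j * dz)
    return (C_KM_S / (100.0 * h)) * total * dz / 3.0


def angular_diameter_distance_mpc(z, **cosmo):
    """D_A = D_C / (1 + z) in a flat universe."""
    return comoving_distance_mpc(z, **cosmo) / (1.0 + z)


def distance_term(z, k_corr=0.0, **cosmo):
    """D(z) = 5 log10(D_L / 10 pc) + k(z), so that M = m - D(z)."""
    d_l_pc = (1.0 + z) * comoving_distance_mpc(z, **cosmo) * 1.0e6
    return 5.0 * math.log10(d_l_pc / 10.0) + k_corr


# Absolute magnitude of an r = 17.7 galaxy placed at z = 0.1 (no k-correction).
m_abs = 17.7 - distance_term(0.1)
print(m_abs)
```

With $h = 1$ the resulting magnitude is the usual $M - 5\log h$ quantity; a realistic k-correction depends on galaxy type and would be supplied through `k_corr`.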



2.3. Subfilters 



Based on current observational studies as well as findings from dark matter halos, and
for convenient comparisons to theoretical models widely used in analytical studies and
N-body simulations, we assume the density profile of galaxies within a cluster follows the form
of an NFW profile (Navarro et al. 1996), which in three dimensions is given by

$$\rho_c(r) = \frac{1}{4\pi r_c^3 F(c)}\, \frac{1}{(r/r_c)\,(1 + r/r_c)^2}, \tag{7}$$

where $c$ is the concentration parameter and $F(c)$ is the typical normalization factor for
galaxies inside the virial radius of the cluster, $r_v = c\,r_c$. The 3-D profile is then integrated
along the line of sight to derive a projected surface density profile $\Sigma_c(r)$, which is expressible
in a much more complicated analytical form (see Bartelmann 1996). The profile is normalized
so that $\int_0^{c r_c} 2\pi r\, \Sigma_c(r)\, dr = 1$.
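
The normalization of the three-dimensional profile can be checked numerically. The sketch below assumes the standard NFW normalization $F(c) = \ln(1+c) - c/(1+c)$, which the text does not spell out, and uses hypothetical function names; it verifies only the 3-D integral inside $r_v = c\,r_c$, not the projected form of Bartelmann (1996).

```python
import math


def nfw_norm(c):
    """Assumed normalization factor F(c) = ln(1 + c) - c / (1 + c)."""
    return math.log(1.0 + c) - c / (1.0 + c)


def nfw_density(r, r_c, c):
    """3-D NFW number density, normalized to unity inside r_v = c * r_c."""
    x = r / r_c
    return 1.0 / (4.0 * math.pi * r_c ** 3 * nfw_norm(c) * x * (1.0 + x) ** 2)


def enclosed_fraction(r_max, r_c, c, steps=20000):
    """Midpoint-rule integral of 4 pi r^2 rho(r) from 0 to r_max."""
    dr = r_max / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * dr
        total += 4.0 * math.pi * r * r * nfw_density(r, r_c, c) * dr
    return total


# Core radius 0.2 and concentration 4 (units of Mpc/h), so r_v = 0.8.
print(enclosed_fraction(0.8, 0.2, 4.0))   # close to 1 by construction
print(enclosed_fraction(0.2, 0.2, 4.0))   # fraction of galaxies inside r_c
```

Under this normalization, roughly a quarter of the members of a $c = 4$ cluster lie inside the core radius, which gives a feel for how strongly the filter weights the central region.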

The search radius for galaxies belonging to the cluster is set to be the virial radius of
the cluster, or more specifically here, the radius inside which the mass overdensity is 200
times the critical density, i.e., $200\,\Omega_m^{-1}$ times the average background (Evrard et al. 2002).
Since it is hard to directly measure the cluster mass overdensity in observations, we instead
determine the virial radius inside which the space density of cluster galaxies is $200\,\Omega_m^{-1}$
times the mean field, assuming that the galaxy distribution in a halo traces the overall dark
matter distribution (see discussions in Hansen et al. 2005; see also Lin et al. 2004; Nagai &
Kravtsov 2005; and the measurements of Sheldon et al. 2004). We use $r_{200}$ throughout this
work to denote the cluster virial radius determined by galaxy
overdensities. The cluster richness is then defined to be the total luminosity in units of $L^*$
inside $r_{200}$.



As has been discussed before in matched-filter studies (Postman et al. 1996; Kim et al.
2002) and also shown by our own numerical experiments, the efficiency of the filter is usually
much more sensitive to the overall filter cutoff radius than to the details of its shape.
Therefore the determination of appropriate values for the scale length in the cluster model is
of particular importance, as it may have significant impact on the detection efficiency of the
cluster-finding algorithm. Most of the previous matched filter methods have used a carefully
chosen fixed value for the model cluster cutoff radii, and they compute the galaxy number
or the richness of clusters within such a fixed radius in physical units. Postman et al. (1996)
conclude that a fixed search radius of 1 Mpc $h^{-1}$ is a near-optimal choice in their radial
filter, and this value has also been adopted by Kepner et al. (1999) and Kim et al. (2002) in
their method, which assumes a modified Plummer law model for the surface density profile.
In White & Kochanek (2002) and Kochanek et al. (2003), the authors set a fixed core radius
of $r_c = 200$ kpc $h^{-1}$ and concentration parameter of $c = 4$ for the NFW profile in the
cluster detection and mass estimates. Although we find from observations and simulations
that these choices are reasonable values for typical rich clusters, a single fixed scale length
for all clusters over a wide range of masses and concentrations will certainly degrade the
signal-to-noise ratio, bias detection probabilities, and be responsible for at least part of the
large scatter observed in previous cluster mass-richness scaling relations. In our modified
adaptive matched filter algorithm, we optimize the core radius for each individual cluster
over the dynamical range for typical galaxy clusters. For the core radius value that maximizes
the likelihood, we then compute the normalized cluster richness according to the NFW
profile with best-fit parameters within a cluster virial radius $r_{200}$ determined from galaxy
overdensities. We believe this procedure is more directly comparable with the virial
mass defined by density contrast in most theoretical studies and analyses of simulations.

For the magnitude filter, we adopt a luminosity profile described by a central galaxy
plus a standard Schechter luminosity function (Schechter 1976)

$$\phi(M) = 0.4 \ln 10\; n^* \left(\frac{L}{L^*}\right)^{1+\alpha} \exp(-L/L^*); \tag{8}$$

the integrated luminosity function is

$$\int_{-\infty}^{M} \phi(M')\,dM' = n^*\, \Gamma[1+\alpha,\, L/L^*]. \tag{9}$$
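
The relation between the Schechter function in magnitudes and its integral can be verified numerically. In the sketch below the function names, the value of $M^*$ and the bright integration limit are illustrative assumptions; the check uses the special cases $\alpha = 0$ and $\alpha = 1$, for which the incomplete gamma function has the closed forms $\Gamma(1, x) = e^{-x}$ and $\Gamma(2, x) = (1 + x)\,e^{-x}$.

```python
import math


def schechter_mag(m_abs, m_star, alpha, n_star=1.0):
    """phi(M) = 0.4 ln(10) n* x^(1 + alpha) exp(-x), with x = L / L*."""
    x = 10.0 ** (-0.4 * (m_abs - m_star))
    return 0.4 * math.log(10.0) * n_star * x ** (1.0 + alpha) * math.exp(-x)


def cumulative_counts(m_faint, m_star, alpha, n_star=1.0,
                      m_bright=-30.0, steps=20000):
    """Midpoint-rule integral of phi(M) from m_bright (a stand-in for
    minus infinity) up to m_faint."""
    dm = (m_faint - m_bright) / steps
    total = 0.0
    for j in range(steps):
        total += schechter_mag(m_bright + (j + 0.5) * dm, m_star, alpha, n_star) * dm
    return total


M_STAR = -20.0                   # placeholder value, not the SDSS fit
x_cut = 10.0 ** (-0.4)           # L / L* one magnitude below L*
print(cumulative_counts(M_STAR + 1.0, M_STAR, alpha=0.0), math.exp(-x_cut))
```

The cut at one magnitude below $L^*$ used in the example corresponds to $L/L^* = 10^{-0.4} \approx 0.4$, the faint limit adopted for the magnitude filter in this paper.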
Parameters for the global luminosity function are obtained from the SDSS spectroscopic
sample at the redshift of z = 0.1 (Blanton et al. 2003). To account for evolutionary
effects, we evolve $L^*$ in magnitude from z = 0 to z = 0.5 (Loveday et al. 1992; Lilly et al.
1995b; Nagamine et al. 2001; Blanton et al. 2003; Loveday 2004; Baldry et al. 2005;
Ilbert et al. 2005). We assume that $L^*$ does not vary as a function of cluster richness, which
is supported by the results of Hansen et al. (2005). Because the matched filter algorithm uses
both a cluster galaxy luminosity function and a field galaxy luminosity function, which are
expected to be different due to the morphology-density relation (Dressler 1980) and the
observed dependence of the luminosity function on galaxy overdensities (Christlein 2000;
Mo et al. 2004; Croton et al. 2005), it would be desirable to model these separately. It would
also be desirable to further model the luminosity distributions according to galaxy spectral
types (Folkes et al. 1999; Lin et al. 1999; Hogg et al. 2003). At this stage, however, only a
single function is adopted, since the work on precise luminosity functions for cluster galaxies
of different types has only just started. We hope to investigate this further on the basis of the
first catalog we produce. Once a cluster catalog is available for galaxies in all redshift ranges,
we can go back and examine the impact of our assumptions about the galaxy luminosity functions
as well as their evolution for different environments and spectral types. In order to use the same
range in the luminosity function at all distances and therefore avoid bias associated with errors
in the assumed form of the luminosity function, we cut off the luminosity function at one
magnitude below $L^*$. We can still calculate total luminosities by integrating the assumed
form, and we use this in our richness calculation, described below.

The existence of Brightest Cluster Galaxies (BCGs) near the cluster centers is incorporated
into our cluster galaxy luminosity model as a separate component from the main
Schechter function for satellites, as this distinction has been clearly seen in clusters over
a range of richness (Tremaine & Richstone 1977; Hansen et al. 2005). We assume a Gaussian
distribution for the luminosities of these objects and adopt the results from Lin & Mohr
(2004) for correlations between the BCG luminosity and host cluster properties. More
specifically, the BCG luminosity is assumed to follow a single power law with the cluster richness
(Zheng et al. 2005; Hansen et al. 2005), with a Gaussian width of 0.5 mag (Lin & Mohr 2004),
and we evolve it in the same way as $L^*$ does, i.e., the luminosity at the mean of the Gaussian
has a constant ratio to $L^*$. This is almost certainly incorrect in detail, but will be explored
in follow-up work once the catalog is constructed. This modification of the general Schechter
function enhances the detectability of typical clusters with BCGs, especially those at higher
redshifts with only a few galaxies other than the BCG included in the apparent
magnitude-limited galaxy sample.
Thanks to the accurate five-band (u, g, r, i, z) multi-color photometry in the SDSS (York et al.
2000), as well as the associated redshift survey for the bright main sample galaxies (Strauss
et al. 2002) and Luminous Red Galaxies (LRGs; Eisenstein et al. 2001), it is now also possible to
retrieve redshift information for most of the galaxies that we are going to use in construction
of the SDSS cluster catalog, either photometrically or spectroscopically. For real SDSS data
currently available from DR5, we find that galaxies with valid photometric redshift estimates
make up more than 96% of the whole sample in the imaging data, within which about 1%,
mostly bright, red galaxies, have matched spectroscopic measurements from redshift surveys.
Not surprisingly, the inclusion of galaxy redshift estimates greatly improves the accuracy of
the cluster redshift determinations and significantly mitigates projection effects, thus allowing
the detection of much poorer systems than possible in previous work with no redshift
measurements.

The uncertainties of galaxy redshifts are assumed to follow Gaussian distributions in
the 2½-D and 3-D cases, where in terms of the $f(z)$ function in equation (5) we have

$$f(z_i - z_k) = \frac{\exp[-(z_i - z_k)^2 / 2\sigma_z^2]}{\sqrt{2\pi}\,\sigma_z}. \tag{10}$$
For galaxies with computed photometric redshifts (described below), we add to the
cluster galaxy density model a third subfilter based on the distribution of derived redshift
uncertainties in the form of a combination of multiple Gaussian modes. These error estimates
are obtained by calibrating photometric redshifts with the real redshifts in the SDSS
spectroscopic galaxy sample and redshifts for other fainter (but smaller) overlapping surveys.
The analysis is done for red and blue galaxies separately using the color separator
by Strateva et al. (2001), and it is found that a model using Gaussian modes with proper
weights assigned generally provides a good description of the bias and scatter in the
photometric redshifts for galaxies of both spectral types and in different apparent magnitude bins.
Some of the results are shown in §3.

In the 3-D case where spectroscopic redshifts of galaxies are measured, we smooth them
in Gaussians with assigned cluster velocity dispersions that vary in the range from 400 km
s$^{-1}$ (proper) for poorer clusters to 1200 km s$^{-1}$ (proper) for the richest systems in the selected
cluster sample, according to several discrete estimated richness classes. The same procedure
as outlined in the previous paragraph for photometric redshifts is applied to include this
redshift filter in the galaxy density model.
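
The redshift subfilter can be sketched as follows. This is an illustration only: the function names are hypothetical, the conversion $\sigma_z = \sigma_v (1 + z)/c$ from a proper velocity dispersion to a redshift width is an assumption of this sketch, and the weights, biases and widths of the Gaussian modes are made-up numbers rather than the calibrated SDSS values.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def gaussian_redshift_filter(z_gal, z_cl, sigma_z):
    """Normalized Gaussian in (z_gal - z_cl) with width sigma_z."""
    norm = math.sqrt(2.0 * math.pi) * sigma_z
    return math.exp(-(z_gal - z_cl) ** 2 / (2.0 * sigma_z ** 2)) / norm


def sigma_z_from_velocity(sigma_v_kms, z_cl):
    """Assumed conversion of a proper velocity dispersion to a redshift width."""
    return sigma_v_kms * (1.0 + z_cl) / C_KM_S


def mixture_filter(z_gal, z_cl, modes):
    """Photometric-redshift filter as a weighted sum of Gaussian modes.

    modes -- list of (weight, bias, sigma); weights should sum to one and
             `bias` is the mean offset of the photometric redshift.
    """
    return sum(w * gaussian_redshift_filter(z_gal - b, z_cl, s)
               for w, b, s in modes)


# 3-D case: a rich cluster at z = 0.2 with a 1200 km/s dispersion.
sigma_spec = sigma_z_from_velocity(1200.0, 0.2)
# 2.5-D case: two illustrative Gaussian modes for the photo-z errors.
modes = [(0.7, 0.0, 0.02), (0.3, 0.01, 0.06)]
print(sigma_spec, mixture_filter(0.21, 0.2, modes))
```

Because every mode is individually normalized, the mixture integrates to one whenever the weights do, so the redshift filter changes how a galaxy's weight is distributed along the line of sight without changing the cluster normalization $N_k$.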

In addition, we find galaxies that either have invalid photometric redshifts
computed or fall into the redshift and magnitude range where no good calibrations are
available. Such galaxies, which are currently about 3%-5% of the whole sample, are assumed
to have no redshift estimates and therefore no constraining filter. Hence we set up for each
galaxy the appropriate scenario that adapts the matched filter algorithm to galaxy redshift
estimates with varied accuracy.

Finally, of course, we fit an overall amplitude, which represents the cluster richness.
Since the cluster's size, shape and redshift are all determined at this point, we can express the
amplitude however we like in physical terms. We have chosen to use the total luminosity within
$r_{200}$ expressed as a multiple of $L^*$ (evolved to the relevant redshift using 1.6 mags of
luminosity evolution per unit redshift), which we denote as $\Lambda_{200}$.
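
As a small worked example of this definition, the sketch below converts a total luminosity inside $r_{200}$ into a richness. The function names are hypothetical, and the sign convention (that $L^*$ is brighter at higher redshift) is an assumption of this sketch.

```python
def lstar_evolution_factor(z, q=1.6):
    """L*(z) / L*(0) for q magnitudes of evolution per unit redshift.

    Assumes L* brightens with redshift: M*(z) = M*(0) - q * z.
    """
    return 10.0 ** (0.4 * q * z)


def richness_lambda200(total_lum_r200, lstar_z0, z):
    """Total luminosity inside r_200 in units of the evolved L*(z)."""
    return total_lum_r200 / (lstar_z0 * lstar_evolution_factor(z))


# The same total luminosity (50 in units of L*(0)) at two redshifts.
print(richness_lambda200(50.0, 1.0, 0.0))
print(richness_lambda200(50.0, 1.0, 0.5))
```

Under this convention, 1.6 mag per unit redshift amounts to 0.8 mag between z = 0 and z = 0.5, i.e. a factor of about 2.1 in $L^*$, so the same observed luminosity corresponds to a lower richness at higher redshift.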



2.4. Implementation 



Implementation of the matched filter algorithm starts with reading the galaxy catalog.
For each galaxy $i$ in the sample, we read in the positions $\alpha_i$, $\delta_i$, the extinction-corrected
five-band apparent magnitudes and their errors, and the redshift $z_i$ if it has a matched spectrum.
Using the flux and color information, we compute a photometric redshift estimate using a
neural network technique by Lin et al. (2006) as well as k-corrections and estimated rest-frame
colors for each galaxy, which we add as input to the cluster-finding algorithm.

The next step is to define the cluster model we adopt for the filters, including the surface
density profile $\Sigma_c(r)$, the luminosity function $\phi(M)$, and the assumed Gaussian modes
of photometric redshift uncertainties. The field density model $P_f(m, z)$ is constructed from
global number counts of the surveyed background galaxy distributions as a function of
magnitude and redshift, as shown in equation (4). We then incorporate these models into the
Poisson likelihood functions as discussed above.

To map the likelihood distribution of the surveyed area, we grid the sky using the
HEALPix package of Górski et al. (2005), which provides a useful hierarchical pixelization
scheme of equal-area pixels. In Kepner et al. (1999), the authors evaluate the likelihoods at
galaxy positions, which form an adaptive grid, instead of on the uniform grids used in
previous matched filter codes (Postman et al. 1996), so that sufficient resolution in the high
density regions is ensured while saving computational time and memory in less dense regions.
We follow this procedure and evaluate the likelihood functions at every galaxy position
to locate the peaks in the map as possible cluster centers. The cluster richness is optimized
over the whole redshift range of our search at intervals that finally adapt to $\delta z = 0.001$, and
for a set of trial cluster scale radii ($r_c$) at $10\,h^{-1}$ kpc steps. The derived quantities for best fit
cluster richness, redshift and scale length thus correspond to the parameters that maximize
the likelihood function at the grid position or candidate cluster center.
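
The richness optimization at one trial center, redshift and scale radius can be illustrated with
a generic unbinned Poisson likelihood. The sketch below assumes a log-likelihood of the form
$\ln L = -\Lambda N_c - N_f + \sum_i \ln(\Lambda c_i + f_i)$, where $c_i$ and $f_i$ are the unit-richness cluster
and field model densities at galaxy $i$ and $N_c$, $N_f$ are their integrals over the search region.
This form, the function names, and the bisection solver are assumptions for illustration; they are
not the Fortran-90 implementation described in the text.

```python
import math


def log_likelihood(lam, cluster_dens, field_dens, n_cluster_unit, n_field):
    """Assumed unbinned Poisson log-likelihood for a cluster of richness lam."""
    total = -lam * n_cluster_unit - n_field
    for c, f in zip(cluster_dens, field_dens):
        total += math.log(lam * c + f)
    return total


def best_richness(cluster_dens, field_dens, n_cluster_unit, lam_max=1.0e4, tol=1.0e-10):
    """Richness that maximizes log_likelihood, found by bisection on its derivative.

    The derivative decreases monotonically with lam (for non-negative cluster
    densities), so one sign change brackets the maximum. Field densities must
    be strictly positive.
    """
    def dlnl(lam):
        return sum(c / (lam * c + f) for c, f in zip(cluster_dens, field_dens)) - n_cluster_unit

    if dlnl(0.0) <= 0.0:
        return 0.0  # no excess over the field: best fit is zero richness
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dlnl(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In a full search one would repeat this one-dimensional maximization for every trial redshift and
scale radius on the grid and keep the combination with the highest log-likelihood.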



This algorithm possesses several new features. First, the cluster algorithm is fully
adaptive to the 2-D, 2½-D and 3-D cases in optical surveys, and can deal with data with
these different attributes simultaneously. It can easily accommodate galaxy redshifts
with uncertainties of any form and distribution, from purely single-band imaging data to
a complete spectroscopic redshift survey, and works well for the intermediate case where 
photometric redshifts are estimated from multi-band color information. Projection effects 
from foreground-background contamination, which have been a long-standing problem for 
optically-selected clusters, are largely suppressed. This allows the detection of even poorer 
systems at high redshift, and shows great potential for current and future large, deeper 
surveys in the optical band. Second, the current adaptive matched filter uses a single Poisson
statistic in the likelihood analysis, in contrast to the two-step approach of Kepner et al.
(1999), which uses a "coarse" filter based on a Gaussian likelihood for pre-selection of clusters.
Our code is written in Fortran-90, and by careful arrangement of the computations and by setting
up a quick link search, the optimization of the Poisson likelihood through the whole process
is now affordable in terms of execution time and memory. For a survey field of ~ 300
deg$^2$, which is comparable to a typical SDSS stripe (York et al. 2000), the modified adaptive
matched filter algorithm requires around 900 megabytes of memory and takes about 30
hours for a single run using one dual-processor node (3.06 GHz clock speed per processor) in a
Linux Beowulf cluster. Because it does not require sufficiently many galaxies inside
each virtual bin, as the Gaussian case does, the Poisson statistic remains robust
in the common situation where there are too few galaxies in each cell for Gaussian statistics
to apply. Third, as discussed in White & Kochanek (2002) and Kochanek et al. (2003),
the current density model explicitly includes the effect of previously found clusters on the
global likelihood function. The procedure automatically separates overlapping clusters and
avoids multiple detections of the same system in overdense regions, somewhat similar
to the CLEAN method used in radio astronomy to produce maps (Högbom 1974; Schwarz
1978). We do not need to do extra cluster de-blending work afterwards. Finally, as discussed
earlier, our approach to maximizing the likelihood differs from most previous cluster-finding
techniques that choose a fixed cluster scale or search radius. We optimize the core radius for
each individual cluster, and the cluster richness is computed within a virial radius which is
determined from galaxy overdensities. This provides insights about the virial mass of such
gravitational systems defined by density contrast and better corresponds to what is done in
theoretical treatments.

3. Tests on Mock Galaxy Catalogs

To evaluate the completeness and purity (the fraction of detections that are not false
positives) of our cluster sample, as well as to assess how well our measured cluster properties
correspond to the properties of the underlying dark matter halos, we have run the matched-filter
algorithm on a mock galaxy catalog generated from a realistic cosmological N-body simulation.
Because of the large redshift range we are trying to probe, it is important to do this with as
large a simulation volume as possible. In addition, because we seek here to test the behavior of
our algorithm using a combination of spectroscopic and photometric redshifts, it is useful to
have a realistic galaxy population in both clusters and the field, with luminosities, colors, and
the relation between these quantities and environment that are a good match to SDSS data. Here
we have used a mock catalog based on the ADDGALS method (Adding Density-Determined
Galaxies to Lightcone Simulations; Wechsler 2004 and in preparation, 2007),
which is designed to model relatively bright galaxies in large volume simulations.

The underlying dark matter simulation used here tracks $10^9$ particles of mass $2.25 \times
10^{12}\,h^{-1} M_\odot$ in a periodic cubic volume with side length of $3\,h^{-1}$ Gpc, using a flat
$\Lambda$CDM cosmology with $\Omega_m = 0.3$, $\sigma_8 = 0.9$, and $h = 0.7$ (the Hubble Volume simulation;
Evrard et al. 2002). Halos are identified for masses above $2.7 \times 10^{13}\,h^{-1} M_\odot$. Data are collected on the
past light cone of an observer at the center of the volume. The size of the simulation enables 
the creation of a full-sky survey out to redshift of z = 0.58, and is thus suited to testing our 
cluster-finding algorithm out to high redshifts using the SDSS imaging data. 
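
The quoted particle mass follows directly from the box size, particle number and matter density,
which gives a quick consistency check on the simulation parameters. The sketch below uses the
present-day critical density $\rho_{\rm crit} \simeq 2.775 \times 10^{11}\,h^2 M_\odot\,{\rm Mpc}^{-3}$; the function name is
hypothetical and the check is not part of the paper.

```python
RHO_CRIT_H2 = 2.775e11  # present-day critical density in h^2 Msun / Mpc^3


def particle_mass(omega_m, box_side_mpc_h, n_particles):
    """Mass per particle in h^-1 Msun for a comoving box of side box_side_mpc_h (h^-1 Mpc)."""
    volume = box_side_mpc_h ** 3  # (h^-1 Mpc)^3
    return omega_m * RHO_CRIT_H2 * volume / n_particles


# Parameters quoted in the text: Omega_m = 0.3, 3 h^-1 Gpc box, 1e9 particles.
m_p = particle_mass(0.3, 3000.0, 1.0e9)
```

The result agrees with the quoted $2.25 \times 10^{12}\,h^{-1} M_\odot$ to better than one percent.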

Galaxies are connected to individual dark matter particles on this simulated light-cone, 
subject to several empirical constraints. The resolution of the simulation allows the mock 
catalog to include galaxies brighter than about 0.4L*; the number of galaxies of a given 
brightness placed within the simulation is determined by drawing galaxies from the SDSS
galaxy luminosity function (Blanton et al. 2003), with 1.6 mags of luminosity evolution
assumed per unit redshift (the same assumption is made by our cluster finding algorithm). The
choice of which particle these galaxies are assigned to is determined by relating the particle
overdensities (on a mass scale of ~ $10^{13} M_\odot$) to the two-point correlation function of the
particles; these particles are then chosen to reproduce the luminosity-dependent correlation
function as measured in the SDSS by Zehavi et al. (2004).



Finally, colors are assigned to each galaxy by measuring its local galaxy density (here,
the fifth nearest neighbor within a redshift slice), and assigning to it the colors of a
real SDSS galaxy with similar luminosity and local density. The local density measure for
SDSS galaxies is taken from a volume-limited sample of the CMU-Pitt DR4 Value Added
Catalog. This method produces mock galaxy catalogs that reproduce the luminosity and
color correlation functions of the real sky. The created mock galaxy sample therefore provides
a unique tool to assess the performance of the SDSS cluster-finding algorithms in terms of
completeness and purity, as well as how the observables of the detected clusters correspond 
to dark matter halos assuming galaxy clusters do trace the underlying halo population in 
the universe. 

Since precise spectroscopic redshift measurements are only available for the SDSS main
sample galaxies (Strauss et al. 2002) and LRGs (Eisenstein et al. 2001), we must use
photometric redshift estimates for most of the galaxies. In order to accurately reproduce this
scenario in the simulations, we scatter the given redshifts of mock galaxies according to the
error distributions of photometric redshift estimates, which are obtained by calibrating a
sample of ~ 140,000 SDSS photometric redshifts against their known corresponding spectroscopic
measurements coming from the SDSS spectroscopic survey and various other sources such
as CNOC2 (Yee et al. 2000), CFRS (Lilly et al. 1995a), DEEP (Weiner et al. 2005), and
2SLAQ LRG (Padmanabhan et al. 2005). The photometric redshifts were computed using
a neural network technique (Lin et al. 2006 and in preparation; see also the short discussion
in the SDSS DR5 data release paper, Adelman-McCarthy 2007). The comparison
between calculated photometric redshifts and measured spectroscopic redshifts is shown in
Figure 1 for both the red and blue galaxy samples. The distributions of redshift
uncertainties are derived for different magnitude and redshift bins, and are found to be well
described by a combination of multiple Gaussians, examples of which are shown in Figure 2. The
resulting fitting parameters are used for the scattering of mock galaxy redshifts in the
simulation. When applying the cluster-finding technique to the real SDSS data, however,
instead of deriving "empirical" error estimates collectively, we would use the photo-z errors
computed with the Nearest Neighbor Error estimate method (NNE; Lin et al. 2006),
which makes it possible to get an estimate of the error for each individual object. This
would better constrain the photometric redshift uncertainty, especially for galaxy samples
with photo-z errors depending strongly on magnitude and on the actual redshift. We find the
computed errors correspond reasonably well with the empirical ones derived from statistics,
with exceptions only for the catastrophic objects. More details will be discussed in a
subsequent paper on the application of the modified adaptive matched-filter technique to
SDSS data.
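
The scattering step described above can be sketched as a draw from a Gaussian mixture. The
component weights, offsets and widths below are hypothetical placeholders for one magnitude and
redshift bin (the fitted parameters are not listed in the text), and the function name is likewise
an assumption.

```python
import random


def scatter_redshift(z_true, components, rng):
    """Draw one photo-z-like redshift from a Gaussian-mixture error model.

    components: list of (weight, mean_offset, sigma) tuples for the bin the
    galaxy falls in. One component is picked with probability proportional
    to its weight, then a Gaussian error is added to the true redshift.
    """
    weights = [w for w, _, _ in components]
    _, offset, sigma = rng.choices(components, weights=weights, k=1)[0]
    return z_true + rng.gauss(offset, sigma)


rng = random.Random(42)  # fixed seed so the mock catalog is reproducible
example_bin = [(0.8, 0.0, 0.02), (0.2, 0.01, 0.08)]  # hypothetical core + broad wing
z_scattered = [scatter_redshift(0.30, example_bin, rng) for _ in range(5)]
```

A narrow core plus a broader wing is one simple way to represent both typical errors and
occasional large excursions; the sketch does not clip negative redshifts.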

To summarize, the simulation of observed galaxy redshifts in the mock sample proceeds as
follows: for galaxies that satisfy the SDSS spectroscopic target selection criteria we take the
given galaxy redshifts as spectroscopic measurements, while for the rest of the sample we use
the scattered redshifts to mimic the photometric redshift estimates. As discussed above in §2,
a few percent of the galaxies fall into the redshift and magnitude ranges where we find no good
calibrations are available. We treat these galaxies as if there were no redshift information at
all to put into the algorithm. We also impose on the mock galaxy catalog an apparent magnitude
cut ($r < 21$), as we intend to adopt for the SDSS imaging sample. The procedure described above
thus provides a mock catalog whose characteristics closely resemble those of the SDSS survey, and
it will allow us to explore the performance of the cluster-finding algorithm on real SDSS data.

The modified matched filter algorithm is then run on the mock galaxy catalog, and the
detected clusters are compared with matched known halos given in the simulation. We find
that the matches are generally robust against details of the matching techniques, as pointed
out by Miller et al. (2005; although see also the discussion of various matching algorithms
in Rozo et al. 2007). Here we adopt a matching criterion of projected separation between
the detection and the candidate halo within the virial radius $r_{200}$ and redshift difference
$\Delta z < 0.05$. To evaluate completeness of the cluster sample, we match each dark halo
to the nearest detected cluster within the projected cluster $r_{200}$ and $\Delta z$ of 0.05, while to
measure purity, we match clusters to their corresponding halos applying the same
criteria. When multiple matches are possible under these criteria,
we simply assign the most massive halo within the search space as the real match. Other
methods have also been tried in efforts to refine the matching process, but no significant
changes are found in the final results.
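
The cluster-to-halo direction of this matching (the one used for purity) can be sketched as
below. For simplicity the code works with flat-sky projected coordinates in the same length unit
as $r_{200}$; the dictionary keys and the function name are hypothetical, and a real catalog would
use angular separations converted at the cluster redshift.

```python
import math


def match_cluster_to_halo(cluster, halos, dz_max=0.05):
    """Return the most massive halo inside the cluster's projected r200 with |dz| < dz_max.

    cluster: dict with keys "x", "y" (projected position), "z", "r200"
    halos:   list of dicts with keys "x", "y", "z", "m200"
    Returns None when no halo satisfies both criteria.
    """
    candidates = [
        h for h in halos
        if math.hypot(h["x"] - cluster["x"], h["y"] - cluster["y"]) <= cluster["r200"]
        and abs(h["z"] - cluster["z"]) < dz_max
    ]
    if not candidates:
        return None
    # Multiple matches: keep the most massive halo in the search space.
    return max(candidates, key=lambda h: h["m200"])
```

The halo-to-cluster direction used for completeness differs only in taking the nearest detected
cluster rather than the most massive candidate.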



4. Results and Discussions 



In this section we present the results of running the modified adaptive matched-filter 
algorithm on the simulation-based mock catalogs. These include the completeness and purity 
check of the detected cluster sample, the derived cluster properties such as estimated redshift 
and richness, and the expected scaling relations that would link the observed clusters to true 
halo distributions. 



4.1. Completeness and Purity Check 

We define the completeness $C$ of the selected cluster sample as a cumulative function of
$M_{200}$, the mass within the virial radius inside which the overdensity is 200 times the critical
density:

$$C(M_{200}) = \frac{N_{\rm found}}{N_{\rm total}} \qquad (11)$$

where $N_{\rm found}$ is the number of halos with mass greater than $M_{200}$ matched to clusters
and $N_{\rm total}$ is the total number of halos above that mass.
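
Equation (11) translates directly into a few lines of code. The sketch below is a minimal
illustration with hypothetical input lists; it returns None when no halo lies above the threshold,
which is a choice made here rather than something specified in the text.

```python
def cumulative_completeness(halo_masses, halo_matched, mass_threshold):
    """Fraction of halos with mass > mass_threshold that are matched to a cluster.

    halo_masses:  list of halo M200 values
    halo_matched: list of booleans, True if the halo was matched to a detection
    """
    selected = [matched for mass, matched in zip(halo_masses, halo_matched)
                if mass > mass_threshold]
    if not selected:
        return None  # completeness is undefined with no halos above the threshold
    return sum(1 for matched in selected if matched) / len(selected)
```

Evaluating the function over a range of thresholds gives the cumulative completeness curve as a
function of mass.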

Figure 3 shows the completeness of the detected cluster sample as a function of redshift
and the virial mass of matched dark matter halos, respectively. The cluster sample, which has
a richness cut at $\Lambda_{200} > 20$, is over 95% complete for objects with $M_{200} > 2.0 \times 10^{14}\,h^{-1} M_\odot$
and ~ 85% complete for objects with masses above $1.0 \times 10^{14}\,h^{-1} M_\odot$ in the redshift range of
0.05 < z < 0.45. As we will find in the subsequent discussion of cluster scaling relations, the
richness cut we impose on the cluster sample contributes to some of the incompleteness for
less massive objects because of the large scatter in the cluster richness-mass relation; many
of the matched clusters at ~ $1.0 \times 10^{14}\,h^{-1} M_\odot$ are simply scattered below the richness cut
and thus not counted in the completeness. This can certainly be alleviated by lowering
the richness cut of the cluster sample, although we choose to keep this cut for the purity
considerations below.

Also from Figure 3a, the completeness level of the cluster sample remains almost flat
out to z ~ 0.45, beyond which it suffers a significant decline. This is at least partly due to
the volume limit of the mock catalog, which only extends to z = 0.58. When we scatter the
given galaxy redshifts with photometric redshift errors, which become large around z ~ 0.5,
many of the galaxies near the far edge of the light cone are scattered away while fewer
galaxies are scattered into that range, since galaxies beyond the edge are absent from the
simulation. The apparent magnitude cut we have applied to the mock galaxy sample may also
contribute to incompleteness at high redshift. Taking into consideration the necessary k-corrections,
the galaxy sample is no longer complete down to the luminosity of $0.4L^*$, which is the limit
assumed throughout the simulation tests. The matched filter therefore loses some power in 
detecting less rich systems at redshifts of z ~ 0.5 and beyond since many fewer galaxies 
would be bright enough to be observable at that distance in the current survey. We have 
not investigated these effects in detail, though the onset of clear incompleteness corresponds 
well to the distance at which they become important. 

We similarly define the purity $P$ of the selected cluster sample as a cumulative function
of cluster richness $\Lambda_{200}$, which is the total cluster luminosity in units of $L^*$ inside its virial
radius $r_{200}$:

$$P(\Lambda_{200}) = \frac{N_{\rm match}}{N_{\rm total}} \qquad (12)$$

where $N_{\rm match}$ is the number of clusters with richness greater than $\Lambda_{200}$ matched to halos
and $N_{\rm total}$ is the total number of clusters with richness above $\Lambda_{200}$.



The results of the purity check for the obtained cluster catalog are shown in Figure 4.
The sample is over 95% pure for clusters with $\Lambda_{200} > 30$ and around 90% pure for clusters
with $\Lambda_{200} > 20$ over the whole redshift range out to z ~ 0.45. As will be shown in the
richness-mass relationship below, these two thresholds in richness correspond to $M_{200} \sim
6.0 \times 10^{13}\,h^{-1} M_\odot$ and $M_{200} \sim 4.0 \times 10^{13}\,h^{-1} M_\odot$, respectively. It is worth noting that
the lower purity for $\Lambda_{200} > 20$ is clearly affected by halo incompleteness in
the simulation, since some of the matched halos for this richness will fall below the mass
resolution of the halo catalog. The purity we have derived above is therefore
probably a lower limit, by logic similar to the completeness arguments.

To ensure a reasonably high purity of selected clusters, we therefore apply a $\Lambda_{200} > 20$
cut for the cluster catalog, which is used for analysis of completeness as well as cluster
derived properties and scaling relations. The purity measurement shows a slight but notable
uptrend in the last redshift bin of z ~ 0.45 - 0.5, which could be similarly explained by the
arguments above in the completeness discussions. This reflects a shift in the richness-mass
scaling relation at the high redshift end, where clusters with the same richness measurements may
correspond to actually richer and more massive systems because of the under-representation 
of galaxies that are observable in that redshift range. It is therefore wise to limit the current 
cluster catalog to a redshift of z = 0.45 in order to extract a uniform sample for statistical 
use, though the catalog using real SDSS data may well go deeper reliably. 



4.2. Derived Cluster Properties and Scaling Relations 

As discussed in §2, for each selected cluster a redshift estimate is found by the matched
filter that optimizes the detection likelihood with the given galaxy position
as cluster center. This measurement is then taken as the estimated redshift for the cluster.
Since all the halos have known redshifts in the simulation, by matching the detected clusters
to halos following the procedure described in §3 we can compare the derived cluster redshifts
with the true redshifts of associated halos.

Figure 5 illustrates the comparison between estimated cluster redshifts and known halo 
redshifts. For clusters with redshifts below z = 0.25 where spectroscopic redshift mea- 
surements are often available for member galaxies, the derived cluster redshift estimates 
precisely reproduce the true redshifts of corresponding dark halos. The inclusion of spec- 
troscopic information of input galaxies markedly sharpens the cluster detection likelihood in 
the line-of-sight dimension and thus provides accurate measurements of the cluster redshifts. 
In the higher redshift range where spectroscopic measurements become rare and photometric
estimates dominate, the plot shows a larger dispersion, but the matched filter still
gives robust determinations of cluster redshifts even with only photometric galaxy redshift
information as input. We find that the accuracy of the redshift estimates does increase
with cluster richness as expected, although this is mostly accounted for by the higher fraction
of cluster member galaxies with spectroscopic measurements inside these systems. There is a
slight uptrend bias seen at the redshift of z ~ 0.45, which we see as a similar indication
of incompleteness of the input galaxy sample near the high end of the redshift range for
this mock catalog because of the volume limit and magnitude cut. The estimated cluster
redshift determined from maximum likelihood tends to drift towards smaller values in some
cases since the detection probability at higher redshift is suppressed by such effects. We also
note the existence of a few serious outliers, which probably represent the occasional
mismatch between clusters and dark halos due to projection
effects or false positive detections.

The normalized cluster richnesses $\Lambda_{200}$ are also compared with the virial mass $M_{200}$ of
matched halos. The results are shown in Figure 6. We find that the richness-mass scaling 
relation follows 



which is roughly a linear fit. Whether this is correct or not, clearly, depends upon 
the details of the simulation input, and the way the simulation was constructed gives no 
easy clue to what the results should be. What is important in this test, however, is that we 
recover what is present in the simulations, not what might or might not be present in the real 
universe. To that end, we have constructed three more plots. The first, Figure 7, compares 
the cluster richness determined by the present algorithm with the total three dimensional 
luminosity of the matched halos; the agreement is very good, with no bias evident at either 
the sparse or the rich end. Given this agreement and the results of Figure 6, the next 
plot, Figure 8, of the 3-D halo luminosity vs the 3-D halo mass, contains no surprises. 
The simulated halo mass is, in fact, linear with its total luminosity, and we recover this 
relationship. 
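
As a rough consistency check on this near-linearity, one can take the two richness thresholds
and approximate masses quoted in the purity discussion ($\Lambda_{200} = 20$ at about $4.0 \times 10^{13}\,h^{-1} M_\odot$
and $\Lambda_{200} = 30$ at about $6.0 \times 10^{13}\,h^{-1} M_\odot$) and compute the power law passing through them.
The sketch below is a two-point estimate for illustration only; it is not the fit shown in
Figure 6, and the function name is hypothetical.

```python
import math


def power_law_through(point_a, point_b):
    """Slope and normalization of richness = norm * (mass / 1e14)**slope through two points.

    Each point is a (mass, richness) pair, with mass in h^-1 Msun.
    """
    (m_a, lam_a), (m_b, lam_b) = point_a, point_b
    slope = math.log(lam_b / lam_a) / math.log(m_b / m_a)
    norm = lam_a / (m_a / 1.0e14) ** slope
    return slope, norm


# Approximate threshold correspondences quoted in the purity discussion.
slope, norm = power_law_through((4.0e13, 20.0), (6.0e13, 30.0))
```

Both mass and richness increase by the same factor of 1.5 between the two points, so the implied
logarithmic slope is unity, with a normalization of about 50 $L^*$ per $10^{14}\,h^{-1} M_\odot$.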

Figure 9 compares the cluster virial radius $r_{200}$ derived by the cluster-finding algorithm
with the $r_{200}$ determined from 3-dimensional galaxy overdensities. The agreement is excellent
at small virial radii, though there is a strong hint that the algorithm slightly overestimates
large virial radii, by seven percent or thereabouts. This is almost certainly due to the
assumption of a single NFW profile to describe the cluster; neighboring halos have rather
different effects in the cylinder to which the algorithm is sensitive and in the corresponding
sphere in the simulations, but it is gratifying that the effects are this small. These results
further justify our choice to refer our richness measurements to the commonly-used virial
radius determined from galaxy overdensities.

It is, however, clear that the scatter in the richness-mass relation derived from the cluster
finding algorithm (Figure 6) is somewhat larger than that of the intrinsic richness-mass
relation in the simulations (Figure 8). This can be read as an indication of complications in
the cluster-halo matching process, e.g., the inevitable difference between the cluster finder
and halo finder regarding fragmentation and merging, differing shapes between the galaxy
and mass distributions, and, even further, the variable mass-to-light ratios inside the systems
incorporated in the current dark matter simulations. Despite these intrinsic dispersions, the
richness-mass scaling relation shows a strong linear correspondence between the observables
and the mass, and thus makes it possible to extract the true halo distribution in the Universe
from the observed cluster abundance and correlation functions. It is important to note
that the simulation from which the catalog was made is a dark-matter-only simulation,
and thus effects which may well exist in real clusters and can affect the baryon fraction in
the intracluster gas and galaxies as a function of cluster mass (see, for example, Kravtsov et al.
2005) are absent here. Nevertheless, the fact that we recover the relation found from the input
3-D simulations, here just linear, indicates that we should be able to investigate a possibly more
complex relationship in the real universe.



5. Conclusions 

We present a modified matched filter algorithm which is designed to construct a com- 
prehensive cluster catalog from the Sloan Digital Sky Survey, but is applicable to any deep 
photometric survey. The technique is fully adaptive to 2-D, 2½-D and 3-D optical surveys,
as well as to various cluster scales and substructures. 

The cluster-finding algorithm has been tested against a realistic mock SDSS catalog from 
a large N-body simulation. The results suggest that the selected cluster sample is ~ 85% 
complete and over 90% pure for systems more massive than $1.0 \times 10^{14}\,h^{-1} M_\odot$ with redshifts
up to z = 0.45. The estimated cluster redshifts derived from maximum likelihood analysis
show small errors with $\Delta z < 0.01$, and the normalized cluster richness measurements fit
linearly with the virial mass of matched halos, the correct relation in this simulation. This
offers hope that the (very likely nonlinear) relation between richness and halo mass which 
exists in the real universe can be investigated with these techniques. 

F.D. thanks H. Lin, H. Oyaizu, and the SDSS photo- z group for providing the photo- 
metric redshifts which allowed us to derive the statistics of the photo- z calibration to the 
spectroscopic redshifts. E.P. is an ADVANCE fellow (NSF grant AST-0649899), also sup- 



Fig. 1. — Calculated photometric redshifts versus corresponding spectroscopic measurements 
for early type galaxies (or red galaxies, left), and late type galaxies (or blue galaxies, right). 
Here, red means $g - r > 1.3$ and blue means $g - r < 1.3$.




Fig. 2. — Examples of multiple Gaussian fits for the error distributions of computed photo- 
metric redshifts. The derived fitting parameters are used to scatter the known redshifts of 
mock galaxies in order to simulate the practice with real SDSS data. 

Fig. 3. — Completeness of the detected cluster sample as a function of redshift and the virial
mass of matched halos, respectively. The sample is consistently > 95%
complete for halos with $M_{200} > 2.0 \times 10^{14}\,h^{-1} M_\odot$ and is ~ 85% complete for halos with
$M_{200} > 1.0 \times 10^{14}\,h^{-1} M_\odot$ in the redshift range of 0.05 < z < 0.45. Note that the annotations
in the figures should read $h^{-1} M_\odot$ instead of $M_\odot$.

Fig. 4. — Purity of the detected cluster sample as a function of redshift and the cluster
richness, respectively. The derived catalog is over 95% pure for clusters with $\Lambda_{200} > 30$ and
around 90% pure for $\Lambda_{200} > 20$ in the redshift range of 0.05 < z < 0.45.

Fig. 5. — Comparison between estimated cluster redshifts and known redshifts of matched
halos.

Fig. 6. — Comparison between derived cluster richness and the virial mass of matched halos.
The cluster richness $\Lambda_{200}$ is the total luminosity of the cluster in units of $L^*$ inside its virial
radius $r_{200}$.

Fig. 7. — Comparison between derived cluster richness and the total luminosity of matched
halos in units of $L^*$. The cluster richness $\Lambda_{200}$ is the total luminosity of the cluster in units
of $L^*$ inside its virial radius $r_{200}$.

Fig. 8. — Comparison between the virial mass of matched halos and their luminosities in
units of $L^*$. The dashed line is the best-fit cluster richness-mass scaling relation given in
Figure 6.

Fig. 9. — Comparison between derived cluster virial radius $r_{200}$ and the halo $r_{200}$ determined
by galaxy overdensities.



